65 research outputs found

    Designing run-time fault-tolerance using dynamic updates

    No full text
    We present a framework for designing run-time fault-tolerance using dynamic program updates triggered by faults. This is an important problem in the design of autonomous systems as it is often the case that a running program needs to be upgraded to its fault-tolerant version once faults occur. We formally state fault-triggered program updates as a design problem. We then present a sound and complete algorithm that automates the design of fault-triggered updates for replacing a program that does not tolerate faults with a fault-tolerant version thereof at run-time. We also define three classes of fault-triggered dynamic updates that tolerate faults during the update. We demonstrate our approach in the context of a fault-triggered update for the gate controller of a parking lot. © 2007 IEEE

    Designing Run-Time Fault-Tolerance Using Dynamic Updates

    No full text
    We present a framework for designing run-time faulttolerance using dynamic program updates triggered by faults. This is an important problem in the design of autonomous systems as it is often the case that a running program needs to be upgraded to its fault-tolerant version once faults occur. We formally state fault-triggered program updates as a design problem. We then present a sound and complete algorithm that automates the design of faulttriggered updates for replacing a program that does not tolerate faults with a fault-tolerant version thereof at run-time. We also define three classes of fault-triggered dynamic updates that tolerate faults during the update. We demonstrate our approach in the context of a fault-triggered update for the gate controller of a parking lot

    Incremental realization of safety requirements: Non-determinism vs. modularity

    No full text
    This paper investigates the impact of non-determinism and modularity on the complexity of incremental incorporation of safety requirements while preserving liveness (a.k.a. the problem of incremental synthesis). Previous work shows that realizing safety in non-deterministic programs under limited observability is an NP-complete problem (in the state space of the program), where limited observability imposes read restrictions on program components with respect to the local state of other components. In this paper, we present a surprising result that synthesizing safety remains an NP-complete problem even for deterministic programs! The results of this paper imply that non-determinism is not the source of the hardness of synthesizing safety in concurrent programs; instead, limited observability has a major impact on the complexity of realizing safety. We also provide a roadmap for future research on exploiting the benefits of modularization while keeping the complexity of incremental synthesis manageable

    Action-based discovery of satisfying subsets: A distributed method for model correction

    No full text
    Context: Understanding and resolving counterexamples in model checking is a difficult task that often takes a significant amount of resources and many rounds of regression model checking after any fix. As such, it is desirable to have algorithmic methods that correct finite-state models when their model checking for a specific property fails without undermining the correctness of the rest of the properties, called the model correction problem. Objective: The objective of this paper is to mitigate time and space complexity of correction. Method: To achieve the objective, this paper presents a distributed method that solves the model correction problem using the concept of satisfying subsets, where a satisfying subset is a subset of model computations that meets a new property while preserving existing properties. The proposed method automates the elimination of superfluous non-determinism in models of concurrent computing systems, thereby generating models that are correct by construction. Results: We have implemented the proposed method in a distributed software tool, called the Model Corrector (ModCor). Due to the distributed nature of the correction algorithms, ModCor exploits the processing power of computer clusters to mitigate the space and time complexity of correction. Our experimental results are promising as we have used a small cluster of five regular PCs to automatically correct large models (with about 3159 reachable states) in a few hours. Such corrections would have been impossible without using ModCor. Conclusions: The results of this paper illustrate that partitioning finite-state models based on their transition relations and distributing them across a computer cluster facilitates the automated correction of models when their model checking fails. © 2012 Elsevier B.V. All rights reserved

    Diconic addition of failsafe fault-tolerance

    No full text
    We present a divide-and-conquer method, called DiConic, for automatic addition of failsafe fault-tolerance to distributed programs, where a failsafe program guarantees to meet its safety specification even when faults occur. Specifically, instead of adding fault-tolerance to a program as a whole, we separately revise program actions so that the entire program becomes failsafe fault-tolerant. Our DiConic algorithm has the potential to utilize the processing power of a large number of machines working in parallel, thereby enabling automatic addition of failsafe fault-tolerance to distributed programs with a large number of processes. We formulate our DiConic synthesis algorithm in terms of the satisfiability problem and demonstrate our approach for the Byzantine Generals problem and an industrial application. Copyright 2007 ACM

    TPGen: A Self-Stablizing GPU-Based Method for Prime and Test Paths Generation

    No full text
    This paper presents a novel scalable GPU-based method for Test Paths (TPs) and Prime Paths (PPs) Generation, called TPGen, used in structural testing and in test data generation. TPGen outperforms existing methods for PPs and TPs generation in several orders of magnitude, both in time and space efficiency. Improving both time and space efficiency is made possible through devising a new non-contiguous and hierarchical memory allocation method, called Three-level Path Access Method (TPAM), that enables efficient storage of maximal simple paths in memory. In addition to its high time and space efficiency, a major significance of TPGen includes its self-stabilizing design where threads execute in a fully asynchronous and order-oblivious way without using any atomic instructions. TPGen can generate PPs and TPs of structurally complex programs that have an extremely high cyclomatic and Npath complexity

    On the Complexity of Adding Convergence

    No full text
    International audienceThis paper investigates the complexity of designing Self-Stabilizing (SS) distributed programs, where an SS program meets two properties, namely closure and convergence. Convergence requires that, from any state, the computations of an SS program reach a set of legitimate states (a.k.a. invariant). Upon reaching a legitimate state, the computations of an SS program remain in the set of legitimate states as long as no faults occur; i.e., Closure. We illustrate that, in general, the problem of augmenting a distributed program with convergence, i.e., adding convergence, is NP-complete (in the size of its state space). An implication of our NP-completeness result is the hardness of adding nonmasking fault tolerance to distributed programs, which has been an open problem for the past decade

    Locality-Based Relaxation: An Efficient Method for GPU-Based Computation of Shortest Paths

    No full text
    Part 2: Algorithms and ComplexityInternational audienceThis paper presents a novel parallel algorithm for solving the Single-Source Shortest Path (SSSP) problem on GPUs. The proposed algorithm is based on the idea of locality-based relaxation, where instead of updating just the distance of a single vertex v, we update the distances of v’s neighboring vertices up to k steps. The proposed algorithm also implements a communication-efficient method (in the CUDA programming model) that minimizes the number of kernel launches, the number of atomic operations and the frequency of CPU-GPU communication without any need for thread synchronization. This is a significant contribution as most existing methods often minimize one at the expense of another. Our experimental results demonstrate that our approach outperforms most existing methods on real-world road networks of up to 6.3 million vertices and 15 million arcs (on weaker GPUs)
    • …
    corecore